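Deep reinforcement learning (DRL) has been shown to be effective for several complex decision-making applications, such as autonomous driving and robotics. However, DRL is notoriously limited by its high sample complexity and its lack of stability. Prior knowledge, e.g. in the form of expert demonstrations, is often available but challenging to leverage to mitigate these issues. In this paper, we propose General Reinforced Imitation (GRI), a novel method that combines the benefits of exploration and expert data and can be implemented on top of any off-policy RL algorithm. We make one simplifying hypothesis: expert demonstrations can be seen as perfect data whose underlying policy receives a constant high reward. Based on this assumption, GRI introduces the notion of an offline demonstration agent. This agent sends expert data, which are processed concurrently with the experiences coming from the online RL exploration agents. We show that our approach enables major improvements in vision-based autonomous driving in urban environments. We further validate the GRI method on Mujoco continuous control tasks with different off-policy RL algorithms. Our method ranked first on the CARLA Leaderboard and outperforms World on Rails, the previous state of the art, by 17%.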
translated by 谷歌翻译
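The GRI abstract above describes expert data being given a constant high reward and processed alongside online experience. A minimal sketch of that idea follows; it is not the authors' implementation. The names `DEMO_REWARD`, `DEMO_RATIO`, `label_demonstrations` and `sample_batch`, the reward value, and the fixed per-batch demonstration share are all assumptions made for illustration.

```python
import random

# Hypothetical constants: the abstract only says the reward is "constant" and "high".
DEMO_REWARD = 1.0   # assumed reward attached to every expert transition
DEMO_RATIO = 0.25   # assumed share of each training batch taken from demonstrations


def label_demonstrations(demos, reward=DEMO_REWARD):
    """Turn expert (state, action, next_state) triples into RL transitions
    by attaching the same constant reward to each of them."""
    return [(s, a, reward, s2) for (s, a, s2) in demos]


def sample_batch(online_buffer, demo_buffer, batch_size,
                 demo_ratio=DEMO_RATIO, rng=random):
    """Draw one mixed batch that an off-policy learner could consume unchanged."""
    n_demo = min(int(round(batch_size * demo_ratio)), len(demo_buffer))
    n_online = min(batch_size - n_demo, len(online_buffer))
    batch = (rng.sample(list(demo_buffer), n_demo)
             + rng.sample(list(online_buffer), n_online))
    rng.shuffle(batch)  # the learner does not see which source a sample came from
    return batch
```

Because the mixing happens at the buffer level, the same sketch would sit in front of any off-policy update rule, which matches the abstract's claim that the method is agnostic to the underlying algorithm.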
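In recent years, deep reinforcement learning (DRL) has made successful inroads into complex decision-making applications such as robotics, autonomous driving and video games. Off-policy algorithms tend to be more sample-efficient than their on-policy counterparts, and can additionally benefit from any off-policy data stored in the replay buffer. Expert demonstrations are a popular source of such data: the agent is exposed to successful states and actions, which can accelerate the learning process and improve performance. In the past, multiple ideas have been proposed to make good use of the demonstrations in the buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We carry out a study to evaluate several of these ideas in isolation, to see which of them has the most significant impact. We also present a new method for sparse-reward tasks, based on a reward bonus given to demonstrations and successful episodes. First, we give a reward bonus to the transitions coming from demonstrations, to encourage the agent to match the demonstrated behaviour. Then, upon collecting a successful episode, we relabel its transitions with the same bonus before adding them to the replay buffer, encouraging the agent to also match its previous successes. The base algorithm for our experiments is the popular Soft Actor-Critic (SAC), a state-of-the-art off-policy algorithm for continuous action spaces. Our experiments focus on robotic manipulation, specifically on a 3D reaching task for a robotic arm in simulation. We show that our method SACR2, based on reward relabeling, improves performance on this task, even in the absence of demonstrations.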
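The two relabeling steps described in this abstract can be sketched as follows. This is an illustrative reading rather than the SACR2 code: the bonus value and the function names `add_bonus`, `store_demonstrations` and `store_episode` are hypothetical, and the replay buffer is modelled as a plain list.

```python
BONUS = 1.0  # hypothetical bonus value; the abstract does not state it


def add_bonus(transitions, bonus=BONUS):
    """Return copies of (state, action, reward, next_state, done) tuples
    with the bonus added to each reward."""
    return [(s, a, r + bonus, s2, done) for (s, a, r, s2, done) in transitions]


def store_demonstrations(replay_buffer, demos, bonus=BONUS):
    """Step 1: demonstration transitions always receive the bonus."""
    replay_buffer.extend(add_bonus(demos, bonus))


def store_episode(replay_buffer, episode, success, bonus=BONUS):
    """Step 2: an episode collected by the agent is relabeled with the same
    bonus only if it ended in success; otherwise it is stored unchanged."""
    if success:
        episode = add_bonus(episode, bonus)
    replay_buffer.extend(episode)
```

Calling `store_episode` without ever calling `store_demonstrations` corresponds to the demonstration-free setting mentioned at the end of the abstract.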
Functionality and dialogue experience are two important factors of task-oriented dialogue systems. Conventional approaches with closed schema (e.g., conversational semantic parsing) often fail as both the functionality and dialogue experience are strongly constrained by the underlying schema. We introduce a new paradigm for task-oriented dialogue - Dialog2API - to greatly expand the functionality and provide a seamless dialogue experience. The conversational model interacts with the environment by generating and executing programs that trigger a set of pre-defined APIs. The model also manages the dialogue policy and interacts with the user by generating appropriate natural language responses. By allowing the generation of free-form programs, Dialog2API supports composite goals by combining different APIs, whereas unrestricted program revision provides a natural and robust dialogue experience. To facilitate Dialog2API, the core model is provided with API documents, an execution environment and, optionally, some example dialogues annotated with programs. We propose an approach tailored for Dialog2API, where the dialogue states are represented by a stack of programs, with the most recently mentioned program on top of the stack. Dialog2API can work with many application scenarios such as software automation and customer service. In this paper, we construct a dataset for AWS S3 APIs and present evaluation results of in-context learning baselines.
Deep neural networks (DNNs) are often used for text classification tasks as they usually achieve high levels of accuracy. However, DNNs can be computationally intensive, with billions of parameters and a need for large amounts of labeled data, which can make them expensive to use, to optimize and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative to DNNs that is easy, lightweight and universal in text classification: a combination of a simple compressor like gzip with a $k$-nearest-neighbor classifier. Without any training, pre-training or fine-tuning, our method achieves results that are competitive with non-pretrained deep learning methods on six in-distribution datasets. It even outperforms BERT on all five OOD datasets, including four low-resource languages. Our method also performs particularly well in few-shot settings where labeled data are too scarce for DNNs to achieve a satisfying accuracy.
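A compressor-plus-$k$-NN classifier of the kind described above can be sketched in a few lines. The abstract does not name the distance, so the normalized compression distance used here is an assumption, as are the helper names `clen`, `ncd` and `classify` and the choice of joining texts with a space.

```python
import gzip
from collections import Counter


def clen(text):
    """Length in bytes of the gzip-compressed UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x, y):
    """Normalized compression distance (assumed metric): small when
    compressing x and y together costs little more than compressing one."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query, train, k=3):
    """Majority vote over the k training texts closest to the query.
    `train` is a list of (text, label) pairs; nothing is fitted in advance."""
    nearest = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

There are no learned parameters: all the work happens at query time, at the cost of one compression per training example for each query.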
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Scoring rules promote rational and good decision making and predictions by models; this is increasingly important for automated procedures of 'auto-ML'. The Brier score and Log loss are well-established scoring rules for classification and regression and possess the 'strict properness' property that encourages optimal predictions. In this paper we survey proposed scoring rules for survival analysis, establish the first clear definition of '(strict) properness' for survival scoring rules, and determine which losses are proper and improper. We prove that commonly utilised scoring rules that are claimed to be proper are in fact improper. We further prove that, under a strict set of assumptions, a class of scoring rules is strictly proper for what we term 'approximate' survival losses. We hope these findings encourage further research into robust validation of survival models and promote honest evaluation.
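To make the properness idea concrete, the sketch below computes the Brier score and log loss for binary classification only; it does not cover the survival setting the paper studies. The function names and the clipping constant `eps` are assumptions. `expected_brier` gives the expected Brier score of a forecast `q` when the true event probability is `p`, which is minimized only at `q = p`.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def expected_brier(q, p):
    """Expected Brier score of forecast q when the event occurs with probability p."""
    return p * (q - 1) ** 2 + (1 - p) * q ** 2


# Strict properness in miniature: over a grid of forecasts, the expected
# score under a true probability of 0.7 is lowest at the honest forecast.
grid = [i / 100 for i in range(101)]
best_forecast = min(grid, key=lambda q: expected_brier(q, 0.7))
```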
Everting, soft growing vine robots benefit from reduced friction with their environment, which allows them to navigate challenging terrain. Vine robots can use air pouches attached to their sides for lateral steering. However, when all pouches are serially connected, the whole robot can only perform one constant curvature in free space. It must contact the environment to navigate through obstacles along paths with multiple turns. This work presents a multi-segment vine robot that can navigate complex paths without interacting with its environment. This is achieved by a new steering method that selectively actuates each single pouch at the tip, providing high degrees of freedom with few control inputs. A small magnetic valve connects each pouch to a pressure supply line. A motorized tip mount uses an interlocking mechanism and motorized rollers on the outer material of the vine robot. As each valve passes through the tip mount, a permanent magnet inside the tip mount opens the valve so that the corresponding pouch is connected to the pressure supply line at the same moment. Novel cylindrical pneumatic artificial muscles (cPAMs) are integrated into the vine robot and inflate to a cylindrical shape for improved bending characteristics compared to other state-of-the-art vine robots. The motorized tip mount controls a continuous eversion speed and enables controlled retraction. A final prototype was able to repeatably grow into different shapes and hold these shapes. We predict the path using a model that assumes a piecewise constant curvature along the outside of the multi-segment vine robot. The proposed multi-segment steering method can be extended to other soft continuum robot designs.
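A piecewise-constant-curvature path can be traced with simple arc geometry. The sketch below is a planar simplification that treats each segment as a circular arc of a single curve, whereas the paper's model is stated for the outside of the robot. The function name `pcc_path` and the `(length, curvature)` segment encoding are assumptions.

```python
import math


def pcc_path(segments, x=0.0, y=0.0, heading=0.0):
    """Chain circular arcs end to end.

    segments: list of (length, curvature) pairs; curvature 0 means straight,
    positive curvature turns counter-clockwise.
    Returns the list of segment end points and the final heading in radians.
    """
    points = [(x, y)]
    for length, kappa in segments:
        if abs(kappa) < 1e-9:
            # Straight segment: advance along the current heading.
            x += length * math.cos(heading)
            y += length * math.sin(heading)
        else:
            # Arc: integrate (cos, sin) of a heading that grows linearly with arc length.
            dtheta = kappa * length
            x += (math.sin(heading + dtheta) - math.sin(heading)) / kappa
            y += (math.cos(heading) - math.cos(heading + dtheta)) / kappa
            heading += dtheta
        points.append((x, y))
    return points, heading
```

For example, `pcc_path([(1.0, 0.0), (math.pi / 2, 1.0)])` describes one straight unit segment followed by a quarter turn of radius 1, which mirrors the idea of each actuated pouch contributing its own curvature to the overall shape.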
Extreme wildfires continue to be a significant cause of human death and biodiversity destruction within countries that encompass the Mediterranean Basin. Recent worrying trends in wildfire activity (i.e., occurrence and spread) suggest that wildfires are likely to be highly impacted by climate change. In order to facilitate appropriate risk mitigation, it is imperative to identify the main drivers of extreme wildfires and assess their spatio-temporal trends, with a view to understanding the impacts of global warming on fire activity. To this end, we analyse the monthly burnt area due to wildfires over a region encompassing most of Europe and the Mediterranean Basin from 2001 to 2020, and identify high fire activity during this period in eastern Europe, Algeria, Italy and Portugal. We build an extreme quantile regression model with a high-dimensional predictor set describing meteorological conditions, land cover usage, and orography, for the domain. To model the complex relationships between the predictor variables and wildfires, we make use of a hybrid statistical deep-learning framework that allows us to disentangle the effects of vapour-pressure deficit (VPD), air temperature, and drought on wildfire activity. Our results highlight that whilst VPD, air temperature, and drought significantly affect wildfire occurrence, only VPD affects extreme wildfire spread. Furthermore, to gain insights into the effect of climate change on wildfire activity in the near future, we perturb VPD and temperature according to their observed trends and find evidence that global warming may lead to spatially non-uniform changes in wildfire activity.
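Quantile regression models are typically fitted by minimizing the pinball (quantile) loss. The sketch below shows that loss in isolation, as general background; the abstract does not state which objective its extreme quantile regression model uses, and the function name `pinball_loss` is hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball loss at quantile level tau (0 < tau < 1).

    Under-prediction (y > q) is weighted by tau and over-prediction by
    1 - tau, so a high tau penalizes missing large values much more heavily.
    """
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += max(tau * diff, (tau - 1) * diff)
    return total / len(y_true)
```

At `tau = 0.9`, predicting one unit too low costs 0.9 while predicting one unit too high costs 0.1, which is why minimizing this loss pushes predictions toward the upper tail of the response.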
Reduced order modeling methods are often used as a means to reduce simulation costs in industrial applications. Despite their computational advantages, reduced order models (ROMs) often fail to accurately reproduce complex dynamics encountered in real-life applications. To address this challenge, we leverage NeuralODEs to propose a novel ROM correction approach based on a time-continuous memory formulation. Experimental results show that our proposed method provides a high level of accuracy while retaining the low computational costs inherent to reduced models.
Producing high-quality forecasts of key climate variables such as temperature and precipitation on subseasonal time scales has long been a gap in operational forecasting. Recent studies have shown promising results using machine learning (ML) models to advance subseasonal forecasting (SSF), but several open questions remain. First, several past approaches use the average of an ensemble of physics-based forecasts as an input feature of these models. However, ensemble forecasts contain information that can aid prediction beyond only the ensemble mean. Second, past methods have focused on average performance, whereas forecasts of extreme events are far more important for planning and mitigation purposes. Third, climate forecasts correspond to a spatially-varying collection of forecasts, and different methods account for spatial variability in the response differently. Trade-offs between different approaches may be mitigated with model stacking. This paper describes the application of a variety of ML methods to predict monthly average precipitation and two-meter temperature two weeks in advance for the whole continental United States, using physics-based predictions (ensemble forecasts) and observational data such as relative humidity, pressure at sea level, or geopotential height. Regression, quantile regression, and tercile classification tasks using linear models, random forests, convolutional neural networks, and stacked models are considered. The proposed models outperform common baselines such as historical averages (or quantiles) and ensemble averages (or quantiles). This paper further includes an investigation of feature importance, trade-offs between using the full ensemble or only the ensemble average, and different modes of accounting for spatial variability.
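In a tercile classification task, each target value is assigned to the lower, middle or upper third of a historical distribution. The sketch below shows one way such labels could be derived. It is an assumption about the setup rather than the paper's pipeline, and the name `tercile_labels` and the 0/1/2 encoding are hypothetical.

```python
import statistics


def tercile_labels(history, values):
    """Label each value 0 (below normal), 1 (near normal) or 2 (above normal)
    relative to the two tercile cut points of the historical record."""
    lower, upper = statistics.quantiles(history, n=3)  # two cut points
    return [0 if v < lower else (2 if v > upper else 1) for v in values]
```

With labels of this kind, any of the classifiers listed in the abstract could be trained on the tercile task.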